【Hackathon No.32】Optimize the GPU compute performance of the expand_as forward & backward ops for Paddle #52700
Conversation
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>
update expand_as_perf
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
Hi, we need TPM approval to pass the static check and complete the CI. @jzhang533 @wanglun @lileding @Superjomn
self.assertTrue(
    out_purefp16_fp32.dtype == fluid.core.VarDesc.VarType.FP32
)
pass
What is the reason for commenting out this part, and what is the impact?
This assertion was there because the original expand_as operator did not support the fp16 type; after we added fp16 support, the assertion could be removed.
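For reference, adding fp16 support to a phi GPU kernel amounts to extending the dtype list in its kernel registration. A minimal sketch, assuming the standard PD_REGISTER_KERNEL macro; the exact dtype list in the PR may have differed:

PD_REGISTER_KERNEL(expand_as,
                   GPU,
                   ALL_LAYOUT,
                   phi::ExpandAsKernel,
                   float,
                   double,
                   int,
                   int64_t,
                   phi::dtype::float16) {}  // fp16 added to the dtype list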
If registering fp16 causes errors, the root cause should be identified and fixed rather than commenting out all the failing checks.
Reverted; registering fp16 is no longer needed.
@Timber-Ye @BrianQian1999 After discussion by the Hackathon organizing committee:
Done.
#include "paddle/phi/kernels/impl/expand_as_grad_kernel_impl.h" | ||
#include "paddle/phi/kernels/funcs/reduce_function.h" | ||
|
||
#define MAX_RANK_SUPPORTED 6 |
Please avoid macro substitution where possible.
Done.
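For reference, one way to follow this suggestion is to replace the macro with a typed constant. A sketch, not the PR's actual diff (per the squash commit message, the PR ultimately removed MAX_RANK_SUPPORTED entirely):

// Prefer a scoped, typed constant over a preprocessor macro, so the value
// obeys namespaces and type checking:
constexpr int kMaxRankSupported = 6;  // replaces: #define MAX_RANK_SUPPORTED 6
// Then use it wherever the macro was used, e.g.
// PADDLE_ENFORCE_LE(out_rank, kMaxRankSupported, ...);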
PADDLE_ENFORCE_LE(
    out_rank,
    MAX_RANK_SUPPORTED,
    errors::InvalidArgument("The rank of the input 'Out@GRAD' for "
This doesn't read very smoothly.
Could you point out which part reads awkwardly? This was borrowed directly from expand_grad_kernel_impl.h, L94-L101.
LGTM
…addle#52700)

* Implement optimized kernel for OP-expand_as.
* Support fp16.
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>
* remove fp16 support
* remove MAX_RANK_SUPPORTED
---------
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>
hi, @Timber-Ye
PR types
Performance optimization
PR changes
OPs
Describe
Currently, the GPU implementation of the expand_as forward and backward operators in Paddle is composed from Eigen primitives and lacks a dedicated GPU kernel, so its performance is relatively poor. The goal is to implement a high-performance GPU compute kernel and thereby optimize the performance of the expand_as op on GPU for Paddle. 【Operator performance optimization design document】
Since the forward pass of expand_as is analogous to broadcasting and the backward pass is analogous to a sum reduction, the operator is optimized by directly reusing Paddle's internal BroadcastKernel and ReduceKernel, as sketched below.
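A minimal sketch of this approach, assuming the phi::funcs::BroadcastKernel and phi::funcs::ReduceKernel signatures in Paddle around the time of this PR; the function names ExpandAsKernelSketch and ExpandAsGradKernelSketch are hypothetical, not the PR's actual code:

#include "paddle/phi/kernels/funcs/broadcast_function.h"
#include "paddle/phi/kernels/funcs/reduce_function.h"

namespace phi {

// Forward: expanding x to the target shape is a pure broadcast, so a
// broadcasted read through an identity functor is sufficient.
template <typename T, typename Context>
void ExpandAsKernelSketch(const Context& ctx,
                          const DenseTensor& x,
                          const std::vector<int64_t>& target_shape,
                          DenseTensor* out) {
  out->Resize(phi::make_ddim(target_shape));
  ctx.template Alloc<T>(out);
  std::vector<const DenseTensor*> ins = {&x};
  std::vector<DenseTensor*> outs = {out};
  funcs::BroadcastKernel<ElementwiseType::kUnary, T, T>(
      ctx, ins, &outs, -1, kps::IdentityFunctor<T>());
}

// Backward: the gradient of a broadcast is a sum reduction over the
// dimensions that were expanded in the forward pass.
template <typename T, typename Context>
void ExpandAsGradKernelSketch(const Context& ctx,
                              const DenseTensor& out_grad,
                              const std::vector<int>& reduce_dims,
                              DenseTensor* x_grad) {
  ctx.template Alloc<T>(x_grad);
  funcs::ReduceKernel<T, T, kps::AddFunctor, kps::IdentityFunctor<T>>(
      ctx, out_grad, x_grad, kps::IdentityFunctor<T>(), reduce_dims);
}

}  // namespace phi

Both sketches delegate all indexing and thread mapping to the shared phi helpers: the forward reads each output element from its broadcast source, and the backward relies on kps::AddFunctor to accumulate gradients over the expanded dimensions.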
After the optimization, the performance comparison between Paddle (Optimized) and the pre-optimization Paddle (Baseline):
Across the above 9 cases, the optimized operator shows improved performance, and the more tensor elements need to be expanded, the larger the speedup; on case 8, the optimized operator's runtime even drops to 1/814 of the baseline's.
Original PR: